Multi-Cluster Patterns for SRE Platform
This document describes patterns for managing multiple SRE platform clusters using Flux CD.
Architecture
Management Cluster (hub)
├── Flux watches: sre-platform repo (main branch)
├── Cluster definitions in: clusters/
│   ├── clusters/dev/
│   ├── clusters/staging/
│   └── clusters/production/
└── Each cluster gets its own Flux Kustomization with path + patches
Workload Clusters (spokes)
├── Each bootstrapped with Flux pointing to same repo
├── Environment-specific overrides via Kustomize patches
└── Shared platform services, per-cluster tenant configs
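To illustrate "its own Flux Kustomization with path + patches", here is a minimal sketch of one such object for the staging cluster. The object name, the label selector, and the replica count are hypothetical; `interval` and `prune` values are assumptions to tune per cluster.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: platform-staging        # hypothetical name
  namespace: flux-system
spec:
  interval: 10m                 # assumed reconcile interval
  prune: true                   # remove resources that were deleted from Git
  path: ./clusters/staging      # per-cluster path in the shared repo
  sourceRef:
    kind: GitRepository
    name: flux-system
  patches:
    # Inline JSON 6902 patch applied to every matching Deployment.
    - target:
        kind: Deployment
        labelSelector: "app.kubernetes.io/part-of=sre-platform"  # hypothetical label
      patch: |
        - op: replace
          path: /spec/replicas
          value: 2
```

Inline `patches` suit small, cluster-wide tweaks; larger differences are usually easier to review as patch files in the cluster's overlay directory.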
Directory Structure
clusters/
├── base/                          # Shared across all clusters
│   └── kustomization.yaml         # Points to platform/core/
├── dev/
│   ├── kustomization.yaml         # Patches for dev (fewer replicas, no persistence)
│   └── patches/
│       └── reduce-resources.yaml
├── staging/
│   ├── kustomization.yaml         # Patches for staging
│   └── patches/
│       └── staging-domain.yaml
└── production/
    ├── kustomization.yaml         # Patches for production (HA, persistence, real TLS)
    └── patches/
        ├── ha-replicas.yaml
        ├── production-domain.yaml
        └── real-tls-issuer.yaml
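As a rough sketch of how an overlay in this layout could be wired, the dev `kustomization.yaml` might pull in the shared base and apply its one patch file. The file contents are an assumption; only the paths come from the tree above.

```yaml
# clusters/dev/kustomization.yaml (sketch, contents assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base                              # shared platform definition
patches:
  - path: patches/reduce-resources.yaml  # dev-only overrides
```

The staging and production overlays would follow the same shape, differing only in the patch files they list.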
Flux Multi-Cluster Bootstrap
Option 1: Single Repo, Multiple Paths
Each cluster's Flux instance points to a different path in the same repo:
# On dev cluster
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m # required field; value is an assumption
  url: https://github.com/org/sre-platform.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m # required field; value is an assumption
  prune: true # required field; deletes resources removed from Git
  path: ./clusters/dev
  sourceRef:
    kind: GitRepository
    name: flux-system
Option 2: Branch-per-Environment
Each cluster's Flux watches a different branch:
spec:
  ref:
    branch: env/production # or env/staging, env/dev
Recommended: Option 1 with Kustomize Overlays
Use a single main branch with Kustomize overlays per cluster. This ensures:
- All clusters share the same base configs
- Differences are explicit in overlay patches
- PRs show exactly what changes per environment
- Promotion is a patch change, not a branch merge
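To make "differences are explicit in overlay patches" concrete, a patch file such as `ha-replicas.yaml` could be a small strategic merge patch like the sketch below. The Deployment name and namespace are hypothetical, and the replica count is an assumed value.

```yaml
# clusters/production/patches/ha-replicas.yaml (sketch, contents assumed)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-gateway       # hypothetical workload defined in the base
  namespace: platform-system   # hypothetical namespace
spec:
  replicas: 3                  # assumed HA replica count for production
```

Kustomize matches the patch to the base resource by kind, name, and namespace, so a PR touching this file shows only the production-specific change.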
Environment Promotion
Developer pushes image tag → CI pipeline runs
  → Updates apps/tenants/<team>/apps/<app>.yaml in dev overlay
  → PR created for staging promotion
  → After staging validation, PR for production
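One possible shape for a promotion PR, assuming the app is deployed from plain manifests and the tag is pinned in the overlay rather than in the tenant app file, is a one-line change to Kustomize's `images` transformer. The image name and tag below are hypothetical.

```yaml
# clusters/staging/kustomization.yaml (sketch, contents assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
patches:
  - path: patches/staging-domain.yaml
images:
  - name: registry.example.com/team/app  # hypothetical image
    newTag: "1.4.2"                      # the promotion PR changes only this line
```

The `images` transformer rewrites container image fields in ordinary workloads; it does not reach into HelmRelease values, so a Helm-based app would need the tag changed in its values instead.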
Cross-Cluster Service Mesh
For services that span clusters, use Istio multi-cluster:
# On each cluster, configure Istio for multi-cluster
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      meshId: sre-mesh
    trustDomain: cluster.local
  values:
    global:
      meshID: sre-mesh
      multiCluster:
        clusterName: <cluster-name>
      network: <network-name>
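When clusters sit on different networks, Istio's multi-network setup also expects the `istio-system` namespace to carry the network name as a label. A minimal sketch, using the same placeholder as the operator config above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  labels:
    # Must match values.global.network for this cluster.
    topology.istio.io/network: <network-name>
```

Managing this label in Git alongside the operator config keeps the two values from drifting apart.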
Secrets Across Clusters
Each cluster runs its own OpenBao instance. For shared secrets, choose one of:
1. OpenBao replication (Enterprise feature)
2. A central secrets source (e.g., AWS Secrets Manager) with External Secrets Operator (ESO) on each cluster
3. SOPS/Age encryption in Git (Flux native support)
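For the SOPS/Age route, Flux decrypts files during reconciliation when the Kustomization carries a `decryption` block. The sketch below assumes a Secret named `sops-age` in `flux-system` that holds the Age private key under a key ending in `.agekey`; the object name, path, and interval are hypothetical.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: shared-secrets          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m                 # assumed reconcile interval
  prune: true
  path: ./clusters/production   # assumed location of the encrypted manifests
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age            # assumed Secret containing the Age private key
```

Each cluster can hold its own Age key, so a secret encrypted for several recipients in Git is readable only by the clusters meant to receive it.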
Monitoring Across Clusters
Options:
1. Thanos: Sidecar on each cluster's Prometheus, central Thanos Query
2. Grafana Cloud: Remote write from each cluster
3. VictoriaMetrics: Cluster mode with a global query layer
For the SRE platform, Thanos is recommended:
# Add to each cluster's monitoring HelmRelease
prometheus:
  prometheusSpec:
    thanos:
      objectStorageConfig:
        secretName: thanos-objstore-config
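The referenced secret holds a Thanos object storage configuration. A minimal sketch for an S3 bucket follows; the namespace, the `objstore.yml` key name, the bucket, the endpoint, and the region are all assumptions, and credentials are left out on the assumption that they come from the environment (for example, an IAM role).

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore-config
  namespace: monitoring            # assumed namespace of the monitoring stack
stringData:
  objstore.yml: |                  # assumed key name; must match what the chart expects
    type: S3
    config:
      bucket: sre-platform-metrics           # hypothetical bucket
      endpoint: s3.us-east-1.amazonaws.com   # hypothetical endpoint
      region: us-east-1                      # hypothetical region
```

Committing this file as plain text is only reasonable when it contains no credentials; otherwise it belongs behind one of the shared-secret mechanisms described earlier.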